Heuristic dynamic programming with internal goal representation

Authors

  • Zhen Ni
  • Haibo He
Abstract

In this paper, we analyze an internal goal structure based on heuristic dynamic programming, named GrHDP, to tackle the 2-D maze navigation problem. Classical reinforcement learning approaches have been introduced to solve this problem in the literature, yet they assign no intermediate reward before the final goal is reached. Here, we integrate one additional network, the goal network, into the traditional heuristic dynamic programming (HDP) design to provide an internal reward/goal representation. The architecture of the proposed approach is presented, followed by a simulation of the 10×10 2-D maze navigation problem. For a fair comparison, we use the same simulation environment settings for the traditional HDP approach. Simulation results show that the proposed GrHDP converges faster with respect to the sum of squared errors and also reaches a lower final error.
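
The data flow described in the abstract can be pictured with a minimal sketch. It is not the authors' implementation: the goal cell, the reward values, the state encoding, the layer sizes, the random weights, and all names (`TinyNet`, `encode`, `hdp_forward`, `grhdp_forward`) are assumptions made for illustration, and the learning rules are omitted entirely. It only contrasts a two-network HDP forward pass with a three-network pass in which a goal network produces an internal signal that the critic also receives.

```python
import math
import random

GRID = 10        # 10x10 maze, as stated in the abstract
GOAL = (9, 9)    # hypothetical goal cell (not given in the source)


def external_reward(cell):
    """Sparse external reward: zero everywhere except the goal (assumed values)."""
    return 1.0 if cell == GOAL else 0.0


class TinyNet:
    """One-hidden-layer tanh network with a scalar output (illustrative only)."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]

    def __call__(self, inputs):
        hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in self.w1]
        return math.tanh(sum(w * h for w, h in zip(self.w2, hidden)))


def encode(cell):
    """Scale (row, col) to [0, 1] so it can be fed to the networks (assumed encoding)."""
    row, col = cell
    return [row / (GRID - 1), col / (GRID - 1)]


def hdp_forward(cell, action_net, critic_net):
    """Traditional HDP: the critic sees only the state and the action."""
    x = encode(cell)
    u = action_net(x)
    J = critic_net(x + [u])
    return u, J


def grhdp_forward(cell, action_net, goal_net, critic_net):
    """GrHDP-style pass: a goal network supplies an internal signal s to the critic."""
    x = encode(cell)
    u = action_net(x)
    s = goal_net(x + [u])        # internal reward/goal representation
    J = critic_net(x + [u, s])   # critic input is extended by s
    return u, s, J


rng = random.Random(0)
action_net = TinyNet(2, 6, rng)
goal_net = TinyNet(3, 6, rng)
critic_net = TinyNet(4, 6, rng)
u, s, J = grhdp_forward((3, 4), action_net, goal_net, critic_net)
```

In this sketch the internal signal `s` is available at every cell, whereas `external_reward` is nonzero only at the assumed goal; that contrast is the motivation the abstract gives for adding the goal network.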

Similar resources

Extracting Dynamics Matrix of Alignment Process for a Gimbaled Inertial Navigation System Using Heuristic Dynamic Programming Method

In this paper, with the aim of estimating the internal dynamics matrix of a gimbaled inertial navigation system (as a discrete linear system), the discrete-time Hamilton-Jacobi-Bellman (HJB) equation for optimal control has been extracted. The heuristic dynamic programming (HDP) algorithm for solving the equation has been presented and then a neural network approximation for cost function and control input ...

A Heuristic Algorithm for Nonlinear Lexicography Goal Programming with an Efficient Initial Solution

In this paper, a heuristic algorithm is proposed to solve a nonlinear lexicography goal programming (NLGP) problem by using an efficient initial point. Some numerical experiments showed that the search quality of the proposed heuristic in a multiple-objective problem depends on the features of the initial point, so in the proposed approach the initial point is retrieved by Data Envelopment Analysis...

A goal programming model for vehicle routing problem with backhauls and soft time windows

The vehicle routing problem with backhauls (VRPB), an extension of the classical vehicle routing problem (VRP), attempts to define a set of routes that serves both linehaul customers, to whom products are to be delivered, and backhaul customers, from whom goods need to be collected. A primary objective for the problem is usually to minimize the total distribution cost. Most real-life problems have othe...

Data-driven heuristic dynamic programming with virtual reality

In this paper, we propose a virtual reality (VR) platform as a case study of machine learning, in this case applied to the goal representation heuristic dynamic programming (GrHDP) approach. In general, a VR platform normally includes a physical module, a control/learning module, and a VR module. It facilitates machine learning research, where scientists and engineers can participate in the sim...

New scheduling rules for a dynamic flexible flow line problem with sequence-dependent setup times

In the literature, multi-objective dynamic scheduling problems and simple priority rules have been widely studied. Although these rules are not efficient enough due to simplicity and lack of general insight, composite dispatching rules have a very suitable performance because they result from experiments. In this paper, a dynamic flexible flow line problem with sequence-dependent se...

Journal:
  • Soft Comput.

Volume 17, Issue 

Pages -

Publication date: 2013